Abstract

Single nucleotide polymorphisms (SNPs) are the most common type of genetic variation in humans and play a major role
in the genetics of human phenotypic variation and the genetic basis of complex human diseases. Recently, there has been
considerable interest in understanding the possible role of the CYP11B2 gene in corticosterone methyl oxidase
deficiency, primary aldosteronism, and cardio-cerebro-vascular diseases. Hence, elucidating the function and
molecular dynamic behavior of CYP11B2 mutations is crucial in current genomics. In this study, we investigated the
pathogenic effect of 51 nsSNPs and 26 UTR SNPs in the CYP11B2 gene through computational platforms. Using a
combination of SIFT, PolyPhen, I-Mutant Suite, and the ConSurf server, four nsSNPs (F487V, V129M, T498A, and V403E) were
identified as potentially affecting the structure, function, and activity of the CYP11B2 protein. Furthermore, molecular dynamics
simulation and structure analyses also confirmed the impact of these nsSNPs on the stability and secondary properties of
the CYP11B2 protein. Additionally, using UTRscan, MirSNP, PolymiRTS and miRNASNP, three SNPs in the 3'UTR region
were predicted to exhibit a pattern change in the upstream open reading frames (uORF), and eight microRNA binding sites
were found to be highly affected by 3'UTR SNPs. This cataloguing of deleterious SNPs is essential for narrowing down
the number of CYP11B2 mutations to be screened in genetic association studies and for a better understanding of the
functional and structural aspects of the CYP11B2 protein.
Introduction

Single nucleotide polymorphisms (SNPs) are the most abundant 
class of genetic variations in the human genome with a frequency 
of approximately every 100 to 300 base pairs [1]. Given that there 
are millions of SNPs in the entire human genome, SNPs are 
important as markers for constructing genetic maps and have 
potential as direct functional variants associated with common and 
genetically complex diseases and drug responses. The vast 
majority of SNPs are neutral allelic variants; thus, one of the 
main goals of SNP research is the identification of functional 
SNPs, which is a crucial step for understanding the molecular basis 
of complex traits and diseases in humans [2]. However, the 
identification of these functional sets of SNPs may be a daunting 
task. Although experimental techniques will provide the strongest 
evidence for the functional role of a genetic variant [3], it is not 
feasible to perform laboratory experiments for all SNPs in the 
human genome or even in a single gene. Hence, theoretical and/or
computational methods are becoming indispensable for the
identification and prioritization of SNPs with functional significance
from an enormous number of non-risk alleles [4].
Computational methods are sufficiently fast and flexible to provide 
reliable predictions of functionally significant SNPs with a high 
accuracy of 80-85% [5-9] when combined with sequence, 
structure, and phylogenetic relationships. 



The aldosterone synthase (CYP11B2) gene is situated on
chromosome 8q24.3 and encodes aldosterone synthase, which is
the key rate-limiting enzyme for the terminal steps of aldosterone
biosynthesis [10]. Previously, Strushkevich N and colleagues
determined the CYP11B2 structure by means of X-ray
crystallography [11]. In recent years, there has been considerable
interest in understanding the possible role of the CYP11B2 gene in
assessing the risk of corticosterone methyl oxidase
deficiency (including CMO I and CMO II), primary aldosteronism,
and cardio-cerebro-vascular diseases [12-17]. However, most
disease association studies have focused on just a few SNPs,
particularly T-344C (rs1799998). Other SNPs in the CYP11B2
gene have not been studied, and in silico investigations of
SNPs in the CYP11B2 gene remain scarce. Recently, Hui E et al.
described a 33-year-old Chinese man whose presentation was
compatible with type 2 aldosterone synthase deficiency and who
carried a heterozygous mutation, c.977C>A (p.Thr326Lys), in exon 3;
computational analysis also confirmed that this missense variant
is deleterious [18]. This illustrates the particular advantages of
bioinformatics in understanding the relationship between genes and
diseases. In this study, we performed computational analyses of
non-synonymous SNPs (nsSNPs) and UTR-region SNPs in the CYP11B2
gene to identify the possible deleterious mutations and propose a
modeled structure for the mutant protein. We are confident that
the results of our study will provide a further understanding of the
role of the CYP11B2 gene in human diseases, as well as a guide for
future experimental work.

Materials and Methods 

Dataset collection 

The SNP information [SNP ID, amino acid position, mRNA
accession number NM_000498.3, and protein accession number
NP_000489.3] of the human CYP11B2 gene used in our
computational analyses was retrieved from the National Center
for Biotechnology Information (NCBI) database of SNPs (dbSNP;
http://www.ncbi.nlm.nih.gov/snp/) [19]. The workflow, tools,
and databases used to identify the potential functional SNPs in the
human CYP11B2 gene are shown in Figure 1.

Assessment of nsSNP functionality 

The functional context of nsSNPs was predicted using SIFT,
PolyPhen and I-Mutant Suite.

SIFT (http://sift.bii.a-star.edu.sg/index.html) is a sequence-
homology-based tool that predicts whether an amino acid substitution
in a protein would be tolerated or damaging [20]. We
ran SIFT by submitting the query in the form of SNP IDs
or chromosome positions and alleles to the nsSNVs tool. Variants at
positions with a tolerance index score ≤0.05 are considered to be
deleterious. A lower tolerance index indicates that the particular
amino acid substitution is likely to have a greater functional impact
[21,22].
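
The cutoff described above can be sketched in a few lines of Python. This is only an illustration of the decision rule, not SIFT's own code; the function name and the example scores are hypothetical, and only the 0.05 cutoff comes from the text.

```python
SIFT_CUTOFF = 0.05  # cutoff stated in the text


def classify_sift(tolerance_index):
    """Label a substitution from its SIFT tolerance index (0 to 1)."""
    if not 0.0 <= tolerance_index <= 1.0:
        raise ValueError("tolerance index must lie between 0 and 1")
    return "deleterious" if tolerance_index <= SIFT_CUTOFF else "tolerated"


# Hypothetical scores, for illustration only.
for score in (0.00, 0.05, 0.32):
    print(score, classify_sift(score))
```

A score exactly at the cutoff is treated as deleterious here, which matches the "less than or equal to 0.05" wording used in the Results.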

PolyPhen (http://genetics.bwh.harvard.edu/pph2/) is an automatic
tool that predicts the possible impact of an amino acid
substitution using a number of features, including sequence,
phylogenetic, and structural information [23]. The query was
submitted in the form of a protein sequence with the mutational
position and substitution. The PolyPhen output comprises a score that
ranges from 0 to 1, with zero indicating a neutral effect of the amino
acid substitution on protein function. Conversely, a high score
represents a variant that is more likely to be damaging.

I-Mutant Suite is a suite of support vector machine (SVM)-
based predictors of protein stability changes according to Gibbs
free energy change, enthalpy change, heat capacity change, and
transition temperature [24]. The analyses were performed based
on the protein sequence combined with the mutational position and
the corresponding new residue. The predicted free
energy change (DDG) classifies the prediction into one of three
classes: largely unstable (DDG < -0.5 kcal/mol), largely stable
(DDG > 0.5 kcal/mol), or neutral (-0.5 < DDG < 0.5 kcal/mol).
I-Mutant Suite is available at
http://gpcr2.biocomp.unibo.it/cgi/predictors/I-Mutant3.0/I-Mutant3.0.cgi.
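
The three DDG classes can be expressed as a small Python helper. This is a sketch of the thresholds quoted above rather than I-Mutant's implementation; the function name is hypothetical, and treating values of exactly -0.5 or 0.5 kcal/mol as neutral is an assumption, since the text leaves the boundaries open.

```python
def classify_ddg(ddg_kcal_per_mol):
    """Map a predicted DDG (kcal/mol) onto the three stability classes."""
    if ddg_kcal_per_mol < -0.5:
        return "largely unstable"
    if ddg_kcal_per_mol > 0.5:
        return "largely stable"
    # Assumption: the boundary values themselves are counted as neutral.
    return "neutral"


# Hypothetical DDG values, for illustration only.
for ddg in (-1.2, 0.1, 0.9):
    print(ddg, classify_ddg(ddg))
```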

Evolutionary conservation analysis of nsSNPs 

An amino acid that plays an essential role, e.g., in enzymatic
catalysis, is likely to remain unaltered despite random evolutionary
drift. Hence, the level of evolutionary conservation is often
indicative of the importance of the position for maintaining the
protein's structure and/or function. The ConSurf server is a
bioinformatics tool for estimating the evolutionary conservation of
amino/nucleic acid positions in a protein/DNA/RNA molecule
based on the phylogenetic relationships between homologous
sequences [25]. After the 3D structure of the query
protein is entered, the conservation scores are calculated based on the
evolutionary relationships among the protein and its homologs
[26,27]. A conservation score between 1 and 4 is considered
variable, a score of 5-6 is intermediate, and a score in the
range of 7 to 9 indicates a conserved position. Use of the empirical
Bayesian method significantly improves the accuracy of the
conservation score estimation, particularly when a small number of
sequences are used for the calculations [26]. ConSurf is available
at http://consurftest.tau.ac.il.

Figure 1. Workflow, tools, and databases used to identify potential functional SNPs in CYP11B2.

Figure 2. Distribution of SNPs.
doi:10.1371/journal.pone.0104311.g002

Evaluation of the functional context of SNPs in the UTR 
region 

The 5' and 3' untranslated regions (UTRs) of eukaryotic mRNAs
play crucial roles in the post-transcriptional regulation of
gene expression through the modulation of nucleocytoplasmic
mRNA transport, translation efficiency, subcellular localization,
and message stability [28-30]. The functional impacts of UTR
SNPs were analyzed using UTRscan [30], MirSNP [31],
PolymiRTS [32] and miRNASNP [33].

The program UTRscan looks for UTR functional elements by
searching through user-submitted sequence data for the patterns
defined in the UTRsite collection. UTRsite is a collection of
regulatory elements located in the 5' and 3' UTRs whose function
and structure have been experimentally determined and published.
If the alternative sequences for a UTR SNP are found to have
different functional patterns, that particular UTR SNP is predicted
to have functional significance. A pattern change caused by a SNP
in the UTR regions can go in either of two directions:
from "have pattern" to "no pattern", or from "no pattern" to "have
pattern". UTRscan is available at http://itbtools.ba.itb.cnr.it/utrscan.
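
The comparison rule described above amounts to a set difference between the functional patterns reported for each allele. The following Python sketch assumes the UTRscan hits for the two allele sequences have already been collected into sets of pattern names; the function names and the example inputs are hypothetical, and no UTRscan output format is implied.

```python
def pattern_change(ref_patterns, alt_patterns):
    """Return the patterns lost and gained when the SNP allele is swapped in."""
    ref, alt = set(ref_patterns), set(alt_patterns)
    return {"lost": ref - alt, "gained": alt - ref}


def has_functional_significance(ref_patterns, alt_patterns):
    """A UTR SNP is flagged when the two alleles give different pattern sets."""
    change = pattern_change(ref_patterns, alt_patterns)
    return bool(change["lost"] or change["gained"])


# Hypothetical example: the reference allele carries a uORF pattern
# ("have pattern") that disappears with the variant allele ("no pattern").
print(pattern_change({"uORF"}, set()))
print(has_functional_significance({"uORF"}, {"uORF"}))
```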

MirSNP is a database of SNPs used to predict
whether a SNP within the target site would decrease/break or
enhance/create a microRNA-mRNA binding site, based on
information from dbSNP135 and miRBase 18. The output of a single
search by gene name includes the mirSVR score, the
effect of the different alleles, the predicted score, conservation
information, and start, end and binding information. Combined
with GWAS or eQTL data sets, MirSNP is highly sensitive and
covers most experimentally confirmed SNPs that affect miRNA
function. MirSNP is available at http://cmbi.bjmu.edu.cn/mirsnp.

PolymiRTS is a database of naturally occurring DNA variations
in microRNA seed regions and microRNA target sites. By integrating
data from CLASH (cross linking, ligation and sequencing of
hybrids) experiments, the PolymiRTS database provides more complete
and accurate microRNA-mRNA interactions. The polymorphic
microRNA target sites are assigned to four classes: 'D'
(the derived allele disrupts a conserved microRNA site), 'N' (the
derived allele disrupts a nonconserved microRNA site), 'C' (the
derived allele creates a new microRNA site) and 'O' (other cases
when the ancestral allele cannot be determined unambiguously).
Class 'C' may cause abnormal gene repression, and class 'D'
may cause loss of normal repression control. These two classes of
PolymiRTS are therefore the most likely to have functional impacts.
PolymiRTS is available at http://compbio.uthsc.edu/miRSNP/.
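
The four PolymiRTS classes and the rule that classes 'C' and 'D' are the most likely to be functional can be captured in a small lookup, sketched here in Python. The dictionary and function names are hypothetical; only the class letters and their meanings come from the text.

```python
CLASS_MEANING = {
    "D": "derived allele disrupts a conserved microRNA site",
    "N": "derived allele disrupts a nonconserved microRNA site",
    "C": "derived allele creates a new microRNA site",
    "O": "other cases (ancestral allele cannot be determined unambiguously)",
}

# Classes singled out in the text as most likely to have functional impact.
FUNCTIONAL_CLASSES = {"C", "D"}


def likely_functional(function_class):
    """Return True for the classes the text treats as likely functional."""
    if function_class not in CLASS_MEANING:
        raise ValueError(f"unknown PolymiRTS class: {function_class!r}")
    return function_class in FUNCTIONAL_CLASSES


for cls in "DNCO":
    print(cls, likely_functional(cls), "-", CLASS_MEANING[cls])
```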
Figure 3. Distribution of deleterious and benign nsSNPs by SIFT, PolyPhen, and I-Mutant Suite. The black rectangular bar indicates the
percentage of nsSNPs that were found to be deleterious by SIFT, damaging (Possibly/Probably) by PolyPhen, and largely unstable by I-Mutant Suite.
The white rectangle indicates the percentage of nsSNPs that were found to be tolerated by SIFT, benign by PolyPhen, and largely stable/neutral by
I-Mutant Suite.
doi:10.1371/journal.pone.0104311.g003



miRNASNP is a database which predicts the effect (loss or gain 
of function) of SNPs within pre-miRNA, mature miRNA, miRNA 
target sequences and flanking regions. Using the SNP IDs of the 
query protein as an input, it produced a list of targets with energy 
change, SNP-miRNA/target duplexes and gain/loss effect by SNP 
in miRNA seed or gene 3'UTR. Focused on the prediction of 
potential effects on miRNA biogenesis and target binding by SNPs 
through both prediction and experimental validation, miRNASNP 
is a useful resource to shed light on further experiments. 
miRNASNP is available at http://www.bioguo.org/miRNASNP/. 



Molecular modeling and molecular dynamics simulation 

A structural analysis was performed to evaluate the structural 
stability of the native and mutant proteins. The crystal structure of 
the CYP11B2 protein was acquired from PDB [Protein Data 
Bank; PDB ID = 4DVQ, (A chain)] [34]. The Modeller 9.11 
package was used to map the mutations on the structure [35]. 
Furthermore, we used energy minimization and molecular 
dynamics simulation (MDS) techniques to understand the 
structural variations in the mutant protein with respect to the 
native structure using the NAMD 2.6 package [36]. The native 
and mutant protein structures were solvated in a water sphere 
using the VMD 1.9.1 package [37]. The cutoff for electrostatic and
van der Waals interactions was 12.0 Å. The temperature was
maintained constant at 310 K through the use of Langevin
dynamics, which provides a means of controlling the kinetic
energy of the system with a damping coefficient (gamma) of 1/ps.
The energy minimization and molecular dynamics simulations
were performed using the CHARMM force field with 5000
iterations and a 1-ns timescale, respectively. The trajectory files
were analyzed to obtain the root-mean-square deviation (RMSD),
radius of gyration (Rg), and solvent-accessible surface area (SASA).

Table 2. Results of the evolutionary conservation analyses using the ConSurf server.

                                      Conservation score
dbSNP         Amino acid change   SWISS-PROT   UniProt   UniRef90
rs200555543   F499C               6            5         5
rs147547282   Y275C               5            5         4
rs146655862   V129M               9            9         9
rs72554626    T498A               7            8         8
rs5317        F487V               7            9         9
rs5315        V403E               9            9         8

A conservation score between 1-4 is considered variable; 5-6 is intermediate; 7-9 is conserved.
doi:10.1371/journal.pone.0104311.t002

Figure 4. ConSurf output using the UniRef90 protein database. Colors of the ConSurf output indicate the level of sequence conservation.
Purple indicates conservation and blue indicates variability. Residues are predicted to be exposed (e), buried (b), functional (i.e., highly conserved and
exposed; f), or structural (i.e., highly conserved and buried; s). Numbers indicate the residue number of CYP11B2. The bold (black) arrows represent the
V129M, Y275C, V403E, F487V, T498A and F499C mutations, respectively.
doi:10.1371/journal.pone.0104311.g004

Statistical analysis 

To determine the differences in the RMSD, Rg and SASA
values between the native and mutant protein structures, statistical
analyses were performed with SAS 9.1 software (SAS Institute,
Inc., Cary, NC). If the quantitative data met the assumptions of both
normality and homogeneity of variance, Student's t-test was
used to compare the native and mutant groups;
otherwise, the nonparametric Wilcoxon two-sample test was used.
The parameters were summarized by medians and interquartile
ranges (IQRs). All P-values are two-sided, and a value of less than
0.05 was considered statistically significant.
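
The summary statistics and the nonparametric comparison described above can be sketched with the Python standard library. This is an illustrative stand-in, not the SAS procedure the authors ran: the quartile convention (`method="inclusive"`), the normal approximation without tie or continuity correction, the function names, and the sample values are all assumptions made for the example.

```python
import math
from statistics import median, quantiles


def median_iqr(values):
    """Return (median, Q1, Q3); the quartile convention is an assumption."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return median(values), q1, q3


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test via the normal approximation.

    Tied values receive average ranks; no tie or continuity correction
    is applied to the variance (a simplification for this sketch).
    """
    pooled = sorted((v, g) for g, sample in enumerate((x, y)) for v in sample)
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        average_rank = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = average_rank
        i = j
    w = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    n1, n2 = len(x), len(y)
    mean_w = n1 * (n1 + n2 + 1) / 2
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean_w) / sd_w
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return w, z, p_two_sided


# Hypothetical per-frame values for two structures, for illustration only.
native = [1.18, 1.20, 1.21, 1.23, 1.26]
mutant = [1.79, 1.80, 1.82, 1.84, 1.86]
print(median_iqr(native), median_iqr(mutant))
print(rank_sum_test(native, mutant))
```

The choice between Student's t-test and the rank-sum test (normality and homogeneity-of-variance checks) is not reproduced here; the sketch covers only the nonparametric branch and the median/IQR summary.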

CYP11B2 database construction 

The database at http://203.81.25.54 contains the results
obtained from this work. The natural variants listed in the
database come from dbSNP. For each nsSNP, we provide
predictions of the functional effects using SIFT, PolyPhen-2, and
I-Mutant Suite. We also list the UTR SNPs that were
predicted to have functional significance by MirSNP, PolymiRTS
and miRNASNP. In addition, PDB structure files of the native and
mutant proteins, as well as the results of the molecular dynamics
simulations, can be downloaded. This database is freely available
and will be regularly updated.

Results 

SNP dataset from dbSNP 

The human CYP11B2 gene contains a total of 358 SNPs, of 
which 51 (14.2%) are nsSNPs and 36 (10.0%) are coding 
synonymous SNPs. The non-coding region includes 166 SNPs 
(46.4%) in the intronic region, 79 (22.1%) SNPs in the "near gene"
region, and 26 SNPs (7.3%) in the mRNA UTR region. The 
distribution of SNPs is shown in Figure 2. We selected the nsSNPs 
and UTR-region SNPs for our subsequent investigations. 

Identification of deleterious and damaging nsSNPs 

The identification of the nsSNPs that confer susceptibility or 
resistance to human diseases should become increasingly feasible 
with improved in silico tools. In this analysis, we employed three 
in silico tools to determine the functional significance of nsSNPs in 
the CYP11B2 gene. Table 1 presents the results obtained through
the SIFT, PolyPhen-2, and I-Mutant Suite analyses of the 
CYP11B2 nsSNPs. 

Through SIFT, 19 nsSNPs (37.3%) were predicted to be 
deleterious with a tolerance score of less than or equal to 0.05. 
Of these 19 SNPs, seven (R181W, F499C, Y275C, V129M, 
T185I, T498A, and V403E) were reported to be highly deleterious 
with a tolerance score of 0.00. 
Table 3. The SNPs in the untranslated regions that were predicted to have functional significance by MirSNP, PolymiRTS and miRNASNP.

SNP           Region   Alleles   MirSNP            PolymiRTS         miRNASNP
rs188784518   UTR-3    A/C       hsa-miR-664-3p    hsa-miR-664a-3p   hsa-miR-664a-3p
rs117910248   UTR-3    A/G       hsa-miR-711       hsa-miR-711       hsa-miR-711
rs61763989    UTR-3    C/T       hsa-miR-1914-3p   hsa-miR-1914-3p   hsa-miR-1914-3p
                                 hsa-miR-5194      hsa-miR-5194      hsa-miR-5194
                                 hsa-miR-423-5p    hsa-miR-423-5p    hsa-miR-423-5p
                                 hsa-miR-3184-5p   hsa-miR-3184-5p   hsa-miR-3184-5p
                                 Also listed (one column only): hsa-miR-6738-5p, hsa-miR-6762-5p, hsa-miR-6845-5p
rs61757284    UTR-3    A/G       hsa-miR-4432      hsa-miR-4432      hsa-miR-4432
rs28390200    UTR-3    C/T       hsa-miR-5196-3p   hsa-miR-5196-3p   hsa-miR-5196-3p
                                 hsa-miR-3122                        hsa-miR-3122
                                 hsa-miR-3189-3p                     hsa-miR-3189-3p
                                 hsa-miR-500b                        hsa-miR-500b
                                 hsa-miR-3913-5p                     hsa-miR-3913-5p
                                 Also listed (one column only): hsa-miR-362-5p
rs7463238     UTR-3    A/G       hsa-miR-1260a     hsa-miR-1260a     hsa-miR-1260a
                                 hsa-miR-1260b     hsa-miR-1260b     hsa-miR-1260b
                                 hsa-miR-4758-3p   hsa-miR-4758-3p   hsa-miR-4758-3p
                                                   hsa-miR-3156-3p   hsa-miR-3156-3p
                                 Also listed (one column only): hsa-miR-188-3p, hsa-miR-4258, hsa-miR-1224-3p, hsa-miR-7108-3p
rs3802228     UTR-3    A/G       hsa-miR-331-5p    hsa-miR-331-5p    hsa-miR-331-5p
                                 hsa-miR-4678      hsa-miR-4678      hsa-miR-4678
rs3097        UTR-3    A/G       hsa-miR-4666b     hsa-miR-4666b     hsa-miR-4666b
                                 Also listed (one column only): hsa-miR-299-3p

doi:10.1371/journal.pone.0104311.t003










We further analyzed the nsSNPs using PolyPhen based on
structural information and multiple sequence alignments. Of the
51 nsSNPs used in our analysis, 14 nsSNPs were predicted to be
"probably damaging", and nine nsSNPs were found to be
"possibly damaging". Consequently, 23 nsSNPs (45.1%) were
characterized as damaging.

To improve the prediction accuracy of structure-based tools, we
then used I-Mutant Suite. We found that 24 nsSNPs (47.1%)
exhibit a DDG value of less than -0.5, which indicates that these
are largely unstable.

The predictive power of determining the functional impact of a
given nsSNP can be significantly increased by combining
information from a variety of tools [38]. Accordingly, we
combined the SIFT, PolyPhen, and I-Mutant Suite programs to
predict the influence of nsSNPs on protein function and structure.
Figure 3 shows the distribution of deleterious and benign nsSNPs
obtained using SIFT, PolyPhen, and I-Mutant Suite. Of the 51 nsSNPs,
37.3%, 45.1%, and 47.1% were flagged by
SIFT, PolyPhen, and I-Mutant Suite, respectively. In addition, six
nsSNPs (F499C, Y275C, V129M, T498A, F487V, and V403E)
were predicted to be functionally significant by all three tools.
Because each in silico tool relies on a different set of alignments
and molecular characteristics, the results of the three tools
differed slightly.
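
Combining the three tools comes down to a set intersection, and the quoted percentages are simple proportions of the 51 nsSNPs. The Python sketch below illustrates both. The per-tool sets are deliberately partial: they contain only variants named in the text (the full lists are in Table 1, which is not reproduced here), and the variable and function names are hypothetical.

```python
TOTAL_NSSNPS = 51


def percent(count, total=TOTAL_NSSNPS):
    """Percentage rounded to one decimal place, as reported in the text."""
    return round(100 * count / total, 1)


# Counts of flagged nsSNPs per tool, as reported in the text.
print(percent(19), percent(23), percent(24))

# Partial per-tool sets, restricted to variants named in the text.
SIX = {"F499C", "Y275C", "V129M", "T498A", "F487V", "V403E"}
sift_deleterious = SIX | {"R181W", "T185I"}  # both reported with SIFT score 0.00
polyphen_damaging = set(SIX)                 # remaining members not listed here
imutant_unstable = set(SIX)                  # remaining members not listed here

consensus = sift_deleterious & polyphen_damaging & imutant_unstable
print(sorted(consensus))
```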

Analysis of nsSNPs in the conserved region 

A disease-causing mutation often resides in highly conserved 
positions. Conservation analyses of the six nsSNPs that were 
predicted to be deleterious by the above-mentioned three tools 
were performed using the ConSurf server based on protein 
structure. Of the six nsSNPs, the four nsSNP positions of V129M, 
T498A, F487V, and V403E were considered to be located in a 
highly conserved amino acid region through homologous sequence 
alignment with the SWISS-PROT, UniProt, and UniRef90 
protein databases. The main results are shown in Table 2 and 
Figure 4. 
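
The selection of the four conserved positions can be reproduced from the Table 2 scores with a short Python sketch. The scores are copied from Table 2; the requirement that a position fall in the 7-9 band for all three databases is our reading of "highly conserved", and the names used in the code are hypothetical.

```python
# Conservation scores from Table 2: (SWISS-PROT, UniProt, UniRef90).
TABLE_2 = {
    "F499C": (6, 5, 5),
    "Y275C": (5, 5, 4),
    "V129M": (9, 9, 9),
    "T498A": (7, 8, 8),
    "F487V": (7, 9, 9),
    "V403E": (9, 9, 8),
}


def grade_band(score):
    """Translate a ConSurf score (1 to 9) into the band used in the text."""
    if 1 <= score <= 4:
        return "variable"
    if 5 <= score <= 6:
        return "intermediate"
    if 7 <= score <= 9:
        return "conserved"
    raise ValueError("ConSurf scores range from 1 to 9")


# Assumption: "highly conserved" means conserved in all three databases.
conserved_in_all = [
    variant
    for variant, scores in TABLE_2.items()
    if all(grade_band(s) == "conserved" for s in scores)
]
print(conserved_in_all)
```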

Functional SNPs in the UTR region 

UTRs are known to play vital roles in the post-transcriptional 
regulation of gene expression, and their importance is emphasized 
by the finding that UTR variations can lead to serious pathology 
[39]. All of the 26 UTR SNPs were analyzed using UTRscan.
After comparing the functional elements for each UTR SNP, we
predicted that three SNPs, namely rs61763988, rs35574522, and
rs3097, exhibited a pattern change of upstream open reading
frame (uORF). Considering the extensive role of UTR SNPs in
microRNA binding sites, which could affect the degradation or
translational suppression of mRNA, we further analyzed the UTR
SNPs with MirSNP, PolymiRTS and miRNASNP. The results
showed that 19 SNPs were predicted to change the binding sites
with microRNAs by MirSNP and miRNASNP. In PolymiRTS, 11
SNPs were found to strongly affect the microRNA binding targets.
Combining the results of these three tools, eight SNPs
(rs188784518, rs117910248, rs61763989, rs61757284, rs28390200,
rs7463238, rs3802228 and rs3097) showed the highest likelihood that
the polymorphism significantly alters microRNA targeting of the
sequence (Table 3).

Figure 5. Backbone RMSDs for the native and mutant CYP11B2 protein structures. The ordinate is RMSD (Å), and the abscissa is time (ps).
Black, blue, green, violet and red lines indicate the native structure and the V129M, V403E, F487V and T498A mutations, respectively.
doi:10.1371/journal.pone.0104311.g005

Molecular dynamics simulation of native and mutant 
CYP11B2 proteins 

To further understand the structural consequences of the 
prioritized deleterious mutations, molecular dynamics simulations 



were conducted to analyze the conformational changes in the 
native and mutant structures (V129M, V403E, F487V, and 
T498A). The trajectory files were produced after the molecular 
dynamics simulation, and we then investigated the RMSD, Rg, 
and SASA variations between the native and the four mutant 
structures. 

Table 4. Data analyses of the last 200 picoseconds of the simulation in RMSD, Rg and SASA.

                              native                F487V                 V129M                 T498A                 V403E
RMSD (Å)     Median (Q1-Q3)   1.21 (1.18-1.26)      1.82 (1.79-1.86)      1.46 (1.36-1.51)      1.47 (1.41-1.50)      1.40 (1.37-1.43)
             P value                                <.0001                <.0001                <.0001                <.0001
Rg (Å)       Median (Q1-Q3)   22.32 (22.29-22.35)   22.58 (22.55-22.61)   22.40 (22.37-22.43)   22.37 (22.34-22.39)   22.32 (22.29-22.35)
             P value                                <.0001                <.0001                <.0001                0.8932
SASA (nm²)   Median (Q1-Q3)   24896 (24830-24980)   24993 (24931-25058)   24821 (24753-24895)   24719 (24667-24778)   24880 (24827-24934)
             P value                                <.0001                <.0001                <.0001                0.0001

doi:10.1371/journal.pone.0104311.t004

Figure 6. Radius of gyration of Cα atoms of the native and mutant CYP11B2 proteins. The ordinate is Rg (Å), and the abscissa is time (ps).
Black, blue, green, violet and red lines indicate the native structure and the V129M, V403E, F487V and T498A mutations, respectively.
doi:10.1371/journal.pone.0104311.g006

We calculated the RMSD for all the atoms from the initial
structure, which was taken as the reference, to measure the
convergence of the protein system (Figure 5). In all five
structures, considerable structural changes were observed during
the initial few picoseconds, leading to an RMSD of ~1.2 Å,
followed by notable structural deviations during the rest of the
simulations. In the last 200 picoseconds of the simulation, the
median RMSD was 1.21 (IQR: 1.18-1.26) Å for the native structure,
1.46 (IQR: 1.36-1.51) Å for V129M, 1.40 (IQR: 1.37-1.43) Å for
V403E, 1.82 (IQR: 1.79-1.86) Å for F487V, and 1.47 (IQR: 1.41-1.50) Å
for T498A (Table 4). The statistical analysis showed
significant differences between the native structure and the four
mutant structures (P<0.0001, particularly F487V). Moreover,
small fluctuations in the average RMSD value after the relaxation
period led to the conclusion that the simulation generated a stable
trajectory and thus provides a credible basis for further analyses.
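
For readers unfamiliar with the quantity, the RMSD between a trajectory frame and the reference structure can be computed as sketched below in Python. This is a minimal illustration, not the analysis pipeline used in the study: it assumes the frame has already been superimposed on the reference (no alignment step is performed), and the function name and coordinates are hypothetical.

```python
import math


def rmsd(reference, frame):
    """Root-mean-square deviation between two equally sized coordinate sets.

    Each argument is a sequence of (x, y, z) tuples in the same atom order.
    Assumption: the frame is already aligned to the reference.
    """
    if len(reference) != len(frame) or not reference:
        raise ValueError("coordinate sets must be non-empty and equally sized")
    squared = sum(
        (xr - xf) ** 2 + (yr - yf) ** 2 + (zr - zf) ** 2
        for (xr, yr, zr), (xf, yf, zf) in zip(reference, frame)
    )
    return math.sqrt(squared / len(reference))


# Hypothetical two-atom example (coordinates in Å).
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
frame = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
print(rmsd(reference, frame))
```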

Figure 7. Solvent-accessible surface area (SASA) of the native and mutant CYP11B2 proteins. The ordinate is SASA (nm²), and the
abscissa is time (ps). Black, blue, green, violet and red lines indicate the native structure and the V129M, V403E, F487V and T498A mutations, respectively.
doi:10.1371/journal.pone.0104311.g007

Figure 8. Spatial superimposition of the native and mutant CYP11B2 proteins. Residues with a low displacement (0 Å) are shown in blue,
those with a high displacement (5 Å) are shown in red, and those with a moderate displacement are shown in white. The CYP11B2 models are
represented in NewCartoon, and the mutated amino acids are represented in CPK.
doi:10.1371/journal.pone.0104311.g008

Rg is defined as the mass-weighted root mean square distance of a
collection of atoms from their common center of mass. Hence, it
provides insight into the overall dimension of a protein. The Rg
plot for the Cα atoms of the protein as a function of time at 310 K
is shown in Figure 6, and the results of the data analyses are shown in
Table 4. The statistical analysis of the Rg values over the last 200
picoseconds of the simulation showed that F487V, V129M and
T498A differed significantly from the native structure [native:
22.32 (IQR: 22.29-22.35) Å; V129M: 22.40 (IQR: 22.37-22.43) Å;
F487V: 22.58 (IQR: 22.55-22.61) Å; T498A: 22.37 (IQR: 22.34-22.39) Å].
As reflected in Figure 6, the F487V mutant curve
differed significantly and fluctuated at a higher rate during the
simulation, indicating that the mutant conformation is
flexible throughout the simulation and that its structure
acquires an expanded conformation compared with the native
structure. In contrast, no difference was found between the
native structure and the V403E structure.
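
The definition of Rg given above translates directly into code. The Python sketch below computes the mass-weighted radius of gyration for a set of atoms; it is an illustration of the definition only, with a hypothetical function name and made-up coordinates, and is not the tool used to analyze the trajectories.

```python
import math


def radius_of_gyration(coords, masses):
    """Mass-weighted RMS distance of atoms from their common center of mass."""
    if len(coords) != len(masses) or not coords:
        raise ValueError("need one mass per atom and at least one atom")
    total_mass = sum(masses)
    center = tuple(
        sum(m * c[axis] for m, c in zip(masses, coords)) / total_mass
        for axis in range(3)
    )
    weighted_sq = sum(
        m * sum((c[axis] - center[axis]) ** 2 for axis in range(3))
        for m, c in zip(masses, coords)
    )
    return math.sqrt(weighted_sq / total_mass)


# Hypothetical example: two atoms of equal mass, 2 Å apart.
print(radius_of_gyration([(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [12.0, 12.0]))
```

A larger Rg for the same set of atoms corresponds to a more expanded conformation, which is how the F487V result is interpreted in the text.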

The SASA is the surface area of a biomolecule that is accessible
to a solvent and can be related to the hydrophobic core. It is
typically calculated using the 'rolling ball' algorithm developed by
Shrake and Rupley in 1973 [40]. The SASA was calculated for the
native and mutant trajectories and is presented in Table 4 and
Figure 7. Data analyses showed that there were significant
differences between all four mutant structures and the native structure
[native: 24896 (IQR: 24830-24980) nm²; V129M: 24821 (IQR:
24753-24895) nm²; V403E: 24880 (IQR: 24827-24934) nm²;
F487V: 24993 (IQR: 24931-25058) nm²; T498A: 24719 (IQR:
24667-24778) nm²]. Compared with the native protein, the
F487V mutant protein exhibited a greater SASA over
time, whereas V129M, V403E and T498A presented lower SASA
values. An increase or decrease in SASA indicates changes in the
exposed amino acid residues and could affect the tertiary structure
of the protein.

Table 5. Ranking SNPs based on molecular dynamics simulation.

dbSNP         AA change   RMSD   Rg   SASA   Spatial superimposition
rs5317        F487V       ++     ++   +      +
rs146655862   V129M       +      +    +      +
rs72554626    T498A       +      +    +
rs5315        V403E       +           +

doi:10.1371/journal.pone.0104311.t005

To properly visualize the structural differences between
the native and mutant proteins, we spatially superimposed the
molecules (Figure 8). The results show that F487V and V129M
exhibit a high displacement (5 Å; shown in red) and that T498A
and V403E present a low displacement (0 Å; shown in blue).

Furthermore, we ranked the above four SNPs based on the results of
the RMSD, Rg, SASA and spatial superimposition analyses
(Table 5). F487V had the highest likelihood of a deleterious
effect, followed by V129M, T498A, and V403E in descending
order.
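
The ordering in Table 5 is consistent with simply counting the plus signs in each row, as the Python sketch below shows. The marks are copied from Table 5, but the scoring rule itself (one point per "+") is an assumption made for illustration; the text does not state how the ranking was computed.

```python
# Marks from Table 5: (RMSD, Rg, SASA, spatial superimposition).
TABLE_5 = {
    "F487V": ("++", "++", "+", "+"),
    "V129M": ("+", "+", "+", "+"),
    "T498A": ("+", "+", "+", ""),
    "V403E": ("+", "", "+", ""),
}


def plus_count(marks):
    """Assumed scoring rule: one point per '+' across the four criteria."""
    return sum(mark.count("+") for mark in marks)


ranking = sorted(TABLE_5, key=lambda v: plus_count(TABLE_5[v]), reverse=True)
for variant in ranking:
    print(variant, plus_count(TABLE_5[variant]))
```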

CYP11B2 database 

During the execution of this project, the CYP11B2 database 
was created to show a more updated and complete set of in silico 
analyses per mutation. This database allows a user to quickly 
retrieve and rapidly analyse the predicted effects of protein 
variants. With its interactive interface, the CYP11B2 database 
allows dynamic utilization by enabling users to select only the 
results of the mutations and algorithms that are most important to 
them. The in silico analysis of CYP11B2 in this database will be
helpful in the design of further experimental research. The 
CYP11B2 database is available at http://203.81.25.54/. 

Discussion 

Because of the application of high-throughput sequencing
technologies, the number of identified genomic variants, particularly
SNPs, in the human genome is rapidly growing. The latest
release of the NCBI dbSNP database (build 141) contains nearly 44
million validated human SNPs [19]. A principal objective of
studies in molecular biology and population genetics is to
distinguish functionally deleterious SNPs from
neutral SNPs. This is also an unavoidable step in genetic
association studies of complex genes and diseases [41]. To the
best of our knowledge, this study provides the first
computational analysis of functional SNPs associated with
the CYP11B2 gene. The value and novelty of this study lie in
prioritizing SNPs with functional significance from an enormous
number of non-risk alleles and providing new insights for further
genetic association studies. Moreover, these identified SNPs could
contribute to aldosterone-induced cardiovascular disease, possibly
representing novel targets for therapy. Of the 358 SNPs, we
selected the nsSNPs and UTR-region SNPs for our investigations;
variants in the near-gene and intronic regions were not explored.

In this study, we attempted to evaluate the deleterious nsSNPs
in three contexts: (1) identification of deleterious nsSNPs through
both sequence- and structure-based methods (SIFT, PolyPhen and
I-Mutant Suite), (2) calculation of the evolutionary conservation of
amino acid positions through a conservation score (ConSurf
server), and (3) measurement of alterations in the protein 3D
structure due to deleterious nsSNPs through a molecular dynamics
approach. Of the 51 nsSNPs associated with the CYP11B2 gene,
four nsSNPs, namely F487V, V129M, T498A, and V403E, were
finally identified as highly deleterious based on the above
comprehensive analyses, particularly F487V.

A number of recent studies have focused mainly on the T-344C 
polymorphism, which affects CYP11B2 promoter activity, 
but the literature on coding substitutions that directly influence the 
structure of the protein is scarce. However, T498A, one of the four 
above-mentioned nsSNPs predicted to be deleterious, 
was found to be strongly associated with CMO-II deficiency, 
which is characterized by very low levels of aldosterone synthesis 
(0.5% or less compared with the wild-type enzyme). The in vitro 
analysis of the enzyme activities of the T498A mutant showed 
efficient 11β-hydroxylase activity but a loss of C18 activity, 
resulting in poor 
aldosterone synthesis [41]. Hence, it appears reasonable to 
speculate that nsSNPs can disrupt the secondary structure of the 
enzyme, thereby compromising aldosterone synthase activity while 
leaving 11β-hydroxylase activity intact. It 
is worth noting that some patients, such as CMO-II deficiency 
patients who reach adulthood, can be asymptomatic and able to 
synthesize adequate amounts of aldosterone at the expense of 
elevated levels of aldosterone precursors. The existence of 
ostensibly asymptomatic individuals with significantly compromised 
aldosterone synthase function may reflect problems of 
ascertainment and may at least partly explain why few coding 
mutations in the CYP11B2 gene have been reported. 

Because the translational regulation of gene expression is as 
important as transcriptional regulation for normal cell function, 
and because its dysfunction is related to the pathophysiology of various 
diseases [42-44], the UTR SNPs in the CYP11B2 gene were also 
evaluated using UTRscan, MirSNP, PolymiRTS and miRNASNP. 
In our study, we found that 7.3% of the SNPs are located in the 
UTRs. After comparing the functional elements for each 
UTR SNP using UTRscan, we found that three SNPs in the 
3'UTR were predicted to exhibit a pattern change in their 
upstream open reading frames (uORFs). However, uORFs in 
the 3'UTR are hypothesized to have no functional importance. 

Due to the importance of translational regulation by 
microRNAs, we further studied whether the 3'UTR SNPs change 
the profile of microRNA binding to the CYP11B2 mRNA using 
MirSNP, PolymiRTS and miRNASNP. Of the 26 UTR SNPs, 
eight (rs188784518, rs117910248, rs61763989, rs61757284, 
rs28390200, rs7463238, rs3802228 and rs3097) were found to 
strongly affect microRNA binding targets according to MirSNP, 
PolymiRTS and miRNASNP. These SNPs can break, create, 
enhance, or decrease microRNA binding (i.e., a single SNP can 
break a microRNA binding site and also potentially create another 
site), with consequences for the regulation of the mRNA degradation 
pathway, thereby affecting mRNA turnover and microRNA 
function. Therefore, these UTR SNPs could disturb 
aldosterone biosynthesis. Recently, mounting 
evidence suggests that aldosterone plays crucial roles in a variety 
of cerebro-, cardiovascular and renal complications [45]. 
Nevertheless, experiments validating these predicted deleterious 
UTR SNPs and exploring their pathomechanisms remain few. Several 
studies have indicated that rs3802228 might be associated with atrial 
structural remodeling and the presence of coronary artery disease 
[46,47]. As reflected in Table 3, rs3802228 could disturb the 
interactions between the mRNA and microRNA-331-5p. Consistent with 
this idea, one recent study demonstrated that the upregulation 
of rno-miR-331* could serve as a prognostic biomarker in the 
clinical therapy of heart failure [48]. In addition, rs3097 (G5937C), 
one of the above eight detrimental SNPs, was also found to be 
associated with cardiac wall thickness [49]. Collectively, these facts 
and speculations suggest a potential role for these identified 
UTR SNPs in the pathogenesis of aldosterone-induced cardiovascular 
complications. It is therefore of considerable interest that 
variants in the CYP11B2 gene could underlie the pathogenesis of some 
cardiovascular diseases, including but not limited to 
primary aldosteronism, and that aldosterone may act as a central 
player in this pathological process. Accordingly, aldosterone 
antagonist treatment seems to be of considerable therapeutic value 
for controlling and limiting the progression of these diseases. This 
newly proposed pathway of CYP11B2 
SNPs/aldosterone/cardiovascular disease opens new research 
insights and therapeutic avenues for cardiovascular diseases. 
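
To make the notion of a SNP breaking or creating a microRNA binding site 
concrete, the following minimal sketch checks for a canonical 7-mer seed 
match (microRNA nucleotides 2-8) around a substituted position. It is a 
deliberate simplification and not the algorithm of MirSNP, PolymiRTS or 
miRNASNP, which apply their own target-prediction criteria; the function 
names and the toy sequences are hypothetical and do not correspond to 
CYP11B2 or to any real microRNA. 

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def to_rna(seq: str) -> str:
    """Upper-case a sequence and write it in the RNA alphabet."""
    return seq.upper().replace("T", "U")


def seed_site(mirna: str) -> str:
    """Return the mRNA 7-mer (5'->3') complementary to miRNA positions 2-8."""
    seed = to_rna(mirna)[1:8]
    return "".join(COMPLEMENT[base] for base in reversed(seed))


def classify_snp(utr: str, pos: int, alt: str, mirna: str) -> str:
    """Classify a substitution at 0-based `pos` as 'break', 'create' or 'unchanged'.

    Only perfect seed matches overlapping the SNP are considered (assumption).
    """
    ref_utr = to_rna(utr)
    alt_utr = ref_utr[:pos] + to_rna(alt) + ref_utr[pos + 1:]
    site = seed_site(mirna)
    # Every 7-mer inside this window overlaps the substituted position.
    lo, hi = max(0, pos - len(site) + 1), pos + len(site)
    in_ref = site in ref_utr[lo:hi]
    in_alt = site in alt_utr[lo:hi]
    if in_ref and not in_alt:
        return "break"
    if in_alt and not in_ref:
        return "create"
    return "unchanged"


# Toy sequences for demonstration only.
toy_mirna = "UACGUACGGAUCCAAGUCAGU"
toy_utr = "AAAACGUACGUAAAA"
print(classify_snp(toy_utr, 7, "G", toy_mirna))  # prints break
```

Because the same substitution is tested against each microRNA 
independently, one SNP can be reported as breaking a site for one microRNA 
and creating a site for another. 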

CYP11B2 protein is a steroid hydroxylase cytochrome P450 
enzyme involved in the biosynthesis of the mineralocorticoid 
aldosterone. It is the sole enzyme capable of synthesizing 
aldosterone in humans and plays an important role in electrolyte 
balance and blood pressure. Mutations in the CYP11B2 gene can 
disturb the biosynthesis of aldosterone, resulting in aldosterone 
synthase deficiency, also known as corticosterone methyloxidase 
deficiency. In addition, CYP11B2 gene variations can 
change gene expression and therefore play an important role in 
many diseases, such as hypertension, primary aldosteronism and 
heart failure. Furthermore, Nicod et al. found that CYP11B2 is also 
strongly associated with the rate of decline in renal allograft 
function [50]. Our in silico studies identified various deleterious 
SNPs, the majority of which have not yet been reported 
experimentally. Nevertheless, these findings highlight attractive 
screening targets for disease association studies involving the 
CYP11B2 protein and provide a guide for future experimental 
work. 

Although the prediction of deleterious SNPs appears to become 
more accurate as more information is integrated, 
some challenges remain. Computational tools 
can predict with strong confidence whether a variant is deleterious, 
but information about which disease the variant is related to, 
and with which disease it has a causal relation, is still 
missing [51]. In addition, evidence shows that variants in regulatory 
regions may alter the consensus of transcription factor binding sites 
or promoter elements; variants in the introns and silent variants in 
exons may alter splicing efficiency. Nevertheless, prediction of 
these variants from genomic sequence remains one of the most 
challenging tasks for bioinformatics. The biggest problem is 
over-prediction: (1) predicted promoters may be expressed only 
cryptically; (2) the vast majority of transcription factor binding 
sites lack distinctive characteristics in either length or sequence; 
(3) cis-regulatory elements, such as ESE (exonic splicing enhancers), 
ESS (exonic splicing silencers), ISE (intronic splicing enhancers) and 
ISS (intronic splicing silencers) sites, are very poorly defined and may 
be located at almost any position within exons and introns. For 
these reasons, we did not perform predictions for 
variants in near-gene and intronic regions in the present study. 

In summary, using a combination of in silico investigations, the 
current study identified four nsSNPs, namely F487V, V129M, 
T498A, and V403E, as deleterious to the structure and function of 
the CYP11B2 protein. The molecular dynamics simulation analyses 
also indicated that the four nsSNPs predicted to be 
deleterious may alter the stability of the protein, as reflected by 
changes in the RMSD, Rg, and SASA. In addition, three SNPs in the 
3'UTR were predicted to influence the translation pattern of the 
CYP11B2 gene through UTRscan analysis, and eight 3'UTR 
SNPs may affect microRNA binding sites, as determined through 
MirSNP, PolymiRTS and miRNASNP analyses. Altered 
CYP11B2 function and protein expression due to mutations may 
play a critical role in determining susceptibility to complex 
diseases. This cataloguing of deleterious SNPs is essential for 
narrowing down the number of CYP11B2 mutations to be 
screened in genetic association studies and for a better 
understanding of the functional and structural aspects of the CYP11B2 
protein. 
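
For readers unfamiliar with the trajectory measures named above, the 
following sketch shows how RMSD and the radius of gyration (Rg) are 
commonly defined for one frame of a trajectory. It is an illustration under 
stated assumptions, not the analysis code of this study: the frame is 
assumed to be already superimposed on the reference, masses default to 
equal weights, and the function names are hypothetical. SASA is omitted 
because it requires a surface-probing algorithm. 

```python
import math
from typing import Optional, Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(reference: Sequence[Coord], frame: Sequence[Coord]) -> float:
    """Root-mean-square deviation between two pre-aligned coordinate sets."""
    if not reference or len(reference) != len(frame):
        raise ValueError("coordinate sets must be non-empty and of equal length")
    squared = sum(
        (a - b) ** 2
        for atom_ref, atom_frame in zip(reference, frame)
        for a, b in zip(atom_ref, atom_frame)
    )
    return math.sqrt(squared / len(reference))


def radius_of_gyration(
    coords: Sequence[Coord], masses: Optional[Sequence[float]] = None
) -> float:
    """Mass-weighted root-mean-square distance of atoms from their center of mass."""
    if not coords:
        raise ValueError("coords must be non-empty")
    if masses is None:
        masses = [1.0] * len(coords)  # assumption: equal weights
    total = sum(masses)
    center = tuple(
        sum(m * c[i] for m, c in zip(masses, coords)) / total for i in range(3)
    )
    squared = sum(
        m * sum((c[i] - center[i]) ** 2 for i in range(3))
        for m, c in zip(masses, coords)
    )
    return math.sqrt(squared / total)
```

Comparing these values frame by frame between the native and a mutant 
trajectory is one way to express a change in stability or compactness. 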

Author Contributions 

Conceived and designed the experiments: MJ XS WG. Performed the 
experiments: MJ BY ZL. Analyzed the data: MJ HS. Contributed 
reagents/materials/analysis tools: XS WG. Wrote the paper: MJ XS WG. 



involvement of cystatin-C in untreated hypertension. Am J Hypertens 26: 683- 
690. 

16. Ji P, Jiang L, Zhang S, Cui W, Zhang D, et al. (2013) Aldosterone Synthase 
Gene (CYP11B2) -344C/T Polymorphism Contributes to the Risk of 
Recurrent Cerebral Ischemia. Genet Test Mol Biomarkers 17: 548-552. 

17. Androulakis E, Tousoulis D, Papageorgiou N, Miliou A, Chatzistamatiou E, 
et al. (2013) Effects of the C-344T aldosterone synthase gene variant on 
preclinical vascular alterations in essential hypertension. Int J Cardiol 168: 
1605-1606. 

18. Hui E, Yeung MC, Cheung PT, Kwan E, Low L, et al. (2014) The clinical 
significance of aldosterone synthase deficiency: report of a novel mutation in the 
CYP11B2 gene. BMC Endocr Disord 14: 29. 

19. Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, et al. (2001) dbSNP: the 
NCBI database of genetic variation. Nucleic Acids Res 29: 308-311. 

20. Kumar P, Henikoff S, Ng PC (2009) Predicting the effects of coding non- 
synonymous variants on protein function using the SIFT algorithm. Nat Protoc 
4: 1073-1081. 

21. Ng PC, Henikoff S (2001) Predicting deleterious amino acid substitutions. 
Genome Res 11: 863-874. 

22. Ng PC, Henikoff S (2003) SIFT: Predicting amino acid changes that affect 
protein function. Nucleic Acids Res 31: 3812-3814. 

23. Adzhubei IA, Schmidt S, Peshkin L, Ramensky VE, Gerasimova A, et al. (2010) 
A method and server for predicting damaging missense mutations. Nat Methods 
7: 248-249. 

24. Capriotti E, Calabrese R, Casadio R (2006) Predicting the insurgence of human 
genetic diseases associated to single point protein mutations with support vector 
machines and evolutionary information. Bioinformatics 22: 2729-2734. 

25. Glaser F, Pupko T, Paz I, Bell RE, Bechor-Shental D, et al. (2003) ConSurf: 
identification of functional regions in proteins by surface-mapping of 
phylogenetic information. Bioinformatics 19: 163-164. 

26. Mayrose I, Graur D, Ben-Tal N, Pupko T (2004) Comparison of site-specific 
rate-inference methods for protein sequences: empirical Bayesian methods are 
superior. Mol Biol Evol 21: 1781-1791. 

27. Pupko T, Bell RE, Mayrose I, Glaser F, Ben-Tal N (2002) Rate4Site: an 
algorithmic tool for the identification of functional regions in proteins by surface 
mapping of evolutionary determinants within their homologues. Bioinformatics 
18 Suppl 1: S71-77. 

28. Mignone F, Gissi C, Liuni S, Pesole G (2002) Untranslated regions of mRNAs. 
Genome Biol 3: REVIEWS0004. 



PLOS ONE | www.plosone.org 

August 2014 | Volume 9 | Issue 8 | e104311 






29. Flynt AS, Lai EC (2008) Biological principles of microRNA-mediated regulation: 
shared themes amid diversity. Nat Rev Genet 9: 831-842. 

30. Grillo G, Turi A, Licciulli F, Mignone F, Liuni S, et al. (2010) UTRdb and 
UTRsite (RELEASE 2010): a collection of sequences and regulatory motifs of 
the untranslated regions of eukaryotic mRNAs. Nucleic Acids Res 38: D75-80. 

31. Liu C, Zhang F, Li T, Lu M, Wang L, et al. (2012) MirSNP, a database of 
polymorphisms altering miRNA target sites, identifies miRNA-related SNPs in 
GWAS SNPs and eQTLs. BMC Genomics 13: 661. 

32. Bhattacharya A, Ziebarth JD, Cui Y (2013) PolymiRTS Database 3.0: linking 
polymorphisms in microRNAs and their target sites with human diseases and 
biological pathways. Nucleic Acids Research 42: D86-D91. 

33. Gong J, Tong Y, Zhang H-M, Wang K, Hu T, et al. (2012) Genome-wide 
identification of SNPs in microRNA genes and the SNP effects on microRNA 
target binding and biogenesis. Human Mutation 33: 254-263. 

34. Strushkevich N, Gilep AA, Shen L, Arrowsmith CH, Edwards AM, et al. (2013) 
Structural insights into aldosterone synthase substrate specificity and targeted 
inhibition. Mol Endocrinol 27: 315-324. 

35. Eswar N, Webb B, Marti-Renom MA, Madhusudhan MS, Eramian D, et al. 
(2006) Comparative protein structure modeling using Modeller. Curr Protoc 
Bioinformatics Chapter 5: Unit 5.6. 

36. Phillips JC, Braun R, Wang W, Gumbart J, Tajkhorshid E, et al. (2005) Scalable 
molecular dynamics with NAMD. J Comput Chem 26: 1781-1802. 

37. Humphrey W, Dalke A, Schulten K (1996) VMD: visual molecular dynamics. J 
Mol Graph 14: 33-38, 27-38. 

38. Rajith B, George Priya Doss C (2011) Path to facilitate the prediction of 
functional amino acid substitutions in red blood cell disorders - a computational 
approach. PLoS One 6: e24607. 

39. Conne B, Stutz A, Vassalli JD (2000) The 3' untranslated region of messenger 
RNA: A molecular 'hotspot' for pathology? Nat Med 6: 637-641. 



40. Shrake A, Rupley JA (1973) Environment and exposure to solvent of protein 
atoms. Lysozyme and insulin. J Mol Biol 79: 351-371. 

41. Zhu M, Zhao S (2007) Candidate gene identification approach: progress and 
challenges. Int J Biol Sci 3: 420-427. 

42. Cazzola M, Skoda RC (2000) Translational pathophysiology: a novel molecular 
mechanism of human disease. Blood 95: 3280-3288. 

43. Reynolds PR (2002) In sickness and in health: the importance of translational 
regulation. Arch Dis Child 86: 322-324. 

44. Scheper GC, van der Knaap MS, Proud CG (2007) Translation matters: protein 
synthesis defects in inherited disease. Nat Rev Genet 8: 711-723. 

45. Quinkler M, Born-Frontsberg E, Fourkiotis VG (2010) Comorbidities in primary 
aldosteronism. Horm Metab Res 42: 429-434. 

46. Huang H, Zhang E, Liu R, Chen YC, Li X, et al. (2011) Polymorphisms within 
micro-RNA-binding sites and risk of coronary artery disease in Chinese: an 
angiography-based study. Eur Heart J 32: 355-355. 

47. Cao EE, Chen XD, Wang QS, Li L, Wang XF, et al. (2009) [Associations of the 
genetic polymorphisms in CYP11B2 gene with nonfamilial structural atrial 
fibrillation]. Zhonghua Liu Xing Bing Xue Za Zhi 30: 1069-1072. 

48. Feng HJ, Ouyang W, Liu JH, Sun YG, Hu R, et al. (2014) Global microRNA 
profiles and signaling pathways in the development of cardiac hypertrophy. 
Braz J Med Biol Res 0: 0. 

49. Mayosi BM, Keavney B, Watkins H, Farrall M (2003) Measured haplotype 
analysis of the aldosterone synthase gene and heart size. Eur J Hum Genet 11: 
395-401. 

50. Nicod J, Richard A, Frey FJ, Ferrari P (2002) Recipient RAS gene variants and 
renal allograft function. Transplantation 73: 960-965. 

51. Wu J, Jiang R (2013) Prediction of deleterious nonsynonymous single-nucleotide 
polymorphism for human diseases. ScientificWorldJournal 2013: 675851. 






